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ABSTRACT 

We demonstrate that two approximations to the \ 2 statistic as popularly employed by observational 
astronomers for fitting Poisson-distributed data can give rise to intrinsically biased model parameter 
estimates, even in the high counts regime, unless care is taken over the parameterization of the problem. 
For a small number of problems, previous studies have shown that the fractional bias introduced by 
these approximations is often small when the counts are high. However, we show that for a broad class 
of problem, unless the number of data bins is far smaller than \/N c , where N c is the total number 
of counts in the dataset, the bias will sti ll like ly be comparable to, or even exceed, the statistical 



error. Conversely, we find that fits using Cash's C-statistic give comparatively unbiased parameter 
estimates when the counts are high. Taking into account their well-known problems in the low count 
regime, we conclude that these approximate \ 2 methods should not routinely be used for fitting an 
arbitrary, parameterized model to Poisson-distributed data, irrespective of the number of counts per 
bin, and instead the C-statistic should be adopted. We discuss several practical aspects of using 
the C-statistic in modelling real data. Wc illustrate the bias for two specific problems — measuring 
the count-rate from a lightcurve and obtaining the temperature of a thermal plasma from its X-ray 
spectrum measured with the Chandra X-ray observatory. In the context of X-ray astronomy, we argue 
the bias could give rise to systematically mis-calibrated satellites and a ^5-10% shift in galaxy cluster 
scaling relations. 

Subject headings: methods: statistical — methods: data analysis — X-rays: galaxies: clusters — X-rays: 
general 



1. INTRODUCTION 

When faced with the problem of fitting a parameter- 
ized model to Poisson-distributed data, observational as- 
tronomers typically adopt one of two approaches. First, 
the maximum likelihood method involves varying the 
model parameters until the probability density function 
of the data given the model is maximal. In practice, ob- 
ser yers t y pically minimize a statistic such as C, defined 
by jCaslj ( |l979| ) which, in a slightly modified form (as 
implemented i n the astronom ical X-ray spectral-fitting 
pack age Xapcc; [/Vrnaud ■ 1006 ), can be written 



where the d and m subscripts indicate whether the data 
or the model are used as weights. In the literature these 
two weighting choices are sometimes referred to as "Ney- 
man's" and "Pearson's" , respectively. The shortcomings 
of these approximations a re wel l- docum ented when there 
are few counts per bin. Cash| ( |1979 ) pointed out that 



deviations from Gaussianity make such approximations 
inaccurate when the counts per bin fall below ~10-20, 
and various authors have quantified how the best-fitting 
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cific models become biased below this limit (e.g. 
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C = 2 ^ Mi - Di + Di log Di - Di log M z 
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2007). A number of other approxi- 



whers Di is the number of detected counts in the i th 
data-bin, Aij = Aij(j>i, . . . ,pk) is the model being fitted, 
and pi, . .. ,pi, are the model parameters. 

Since the absolute value of the C-statistic cannot be 
directly interpreted as a goodness-of-fit indicator, ob- 
servers typically prefer in stead to minimize th e better- 
known x 2 fit statistic (e.g. Lampton et al. 1976| ). As that 
statistic is strictly only defined for Gaussian-distributed 
data, observers generally approximate the true \ 2 by a 
data-based summation of the form 

(Mi - Di) 2 



(e.g. Whcaton et al. 199 



je en proposed to mitigate th i s effect 



_ Kearns et al. 1995; Chura- 
zov et al.| [l99(i ; Mighell 1999 ). In contrast, at least for 



some problems, fits using the C-statistic are found to be 
far less biased for low counts data 
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When the number of counts per bin exceeds ~15— 
20, the deviations from Gaussianity become less severe. 
Therefore, it is common practice in observational astron- 
omy to assume that, in such cases, Xd anc ^ Xm sufficiently 
well approximate the true \ 2 ancl that the model parame- 
ters for an arbitrarily parameterized model that minimize 
those statistics are relatively unbiased estimates of the 
true parameter values. The meaning of "relatively" here 
depends on context; for most observers a non-negligible 
bias would be acceptable provided it does not lead to the 
wrong scientific conclusions. This pragmatic approach to 
statistical inference is common in the observational liter- 
ature, but differs from the more rigorous methods gen- 
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erally preferred among statisticians. Nevertheless, when 
employing any approximation, it should be contingent 
upon the observer to assess whether it could potentially 
lead to wrong conclusions. Unfortunately, this is seldom 
done, and approximations such as Xd 01 Xm are often 
used without comment for a given problem. 

For the simple problem of measuring the count-rate of 
a (non-varying) source given its lightcurve, a number of 
authors have assessed the accuracy of using the x\ ano - 
X m approximations. As the count-rate becomes large, 
the fitted count-rate which minimizes x\ i s asymptot- 
ically found to underestimate the true rate by ~ r _1 
count s _1 , while similar fits using x m overestimate it by 
0.5t -1 co unt s -1 , where r is the duration (in seconds) 



of each bin ( Whcaton ct al. 


1995; 


Jading & Riisager 


1996 


Mighcll 


1999 




dauschild & Jentsche] 


pool 


). This can be 



understood as arising from the misparamctcrization of 
the problem; when one puts Mi — pr, where p is the 
count-rate of the source, the dependence of the denomi- 
nator in Eqn |on p naturally leads to a bias. Similarly, 
the dependence of the denominator in Eqn ^ on the ob- 
served data also produces a systematic bias when mini- 
mizi ng y 2 with r espect to p ( Wheaton et al. 1995| : Jading 
& Ri isager 1996 ). Nonetheless, as the number of counts 
increases this corresponds to an increasingly small frac- 
tional bias. If one only requires to know the absolute 
count-rate to a given fractional accuracy, therefore, the 
use of Xd 01 Xm ma y be "good enough", provided the 
count rate is sufficiently high. 

In this paper, we point out that a more relevant quan- 
tity than the fractional bias for assessing the usefulness of 
the approximations used in fitting is /(,, the bias divided 
by the statistical error. For two very different physical 
problems, obtaining the count-rate from a lightcurve and 
obtaining the temperature of a thermal plasma from its 
X-ray spectrum, we compute fb for fits to realistic data 
which minimize x|> Xm an d C. For Xd an d xi fits, we 
find that fb can be of order unity, or even worse, even 
if the number of photons per bin far exceeds the nomi- 
nal ~20 counts. In contrast, for the C-statistic fits, we 
find \fb\ <Cl. We explain these results in terms of an ap- 
proximate, analytical expression for fb for each statistic, 
and show that fits of an arbitrary, parameterized model 
are, in general, far less biased when the C-statistic is em- 
ployed than Xd or Xmi unless the model parameterization 
is chosen carefully. Finally, we discuss the possible sci- 
entific impact of the bias, as well as the advantages and 
practical implementation of using the C-statistic instead 
for data-modelling. We stress that we are not, in this 
paper, attempting a formal, statistical assessment of the 
validity of using x 2 methods in general to model any par- 
ticular problem, but rather we are asking whether the 
current approximate x 2 methods for Poisson-distributed 
data that are widely employed by observers are useful (in 
the sense \fb\ <Cl). 

2. THE BIAS 

In this section, we investigate two very different prob- 
lems, specifically the linear problem of obtaining the 
count-rate of a (non-variable) source from its lightcurve 
and the highly nonlinear problem of obtaining the tem- 
perature of a thermal plasma from its X-ray spectrum. 
We use Monte Carlo simulations to measure fb as a func- 
tion of the "true" parameter value, the number of counts 
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Fig. 1. — fb for the lightcurve model, as a function of total 
counts N c in the lightcurve. Circles denote /{, obtained with the 
\ 2 d statistic, triangles are for \m f^s, and stars are for C-statistic 
minimization. Results are shown for 50, 100, 500 and 1000 counts 
per bin (solid lines, dashed lines, dot-dash lines and dotted lines, re- 
spectively). The lines are the analytical approximations discussed 
in § |. 

in each dataset and the adopted binning. 

2.1. Lightcurve 

We first considered the problem M i0 = p T, where M i0 
is the true model value in the i th bin, r is the binsize of 
the lightcurve and po the true count-rate of the source. 
We sought an estimate of po by fitting a model of the 
form Mi = pr to the data. For each of a range of differ- 
ent values of po (50, 100, 500 and 1000 count s _1 ) and 
N c , we simulated a set of 1000 lightcurves with r = 1 s, 
assuming that the total counts per bin were Poisson dis- 
tributed about pqt. For each simulated lightcurve we 
used customized software built around the MINUIT soft- 
ware library 2 to obtain the value of p which minimized 
each statistic {x% Xm an( i C). The mean and standard 
deviation of the best-fitting p values were measured for 
each (j>o,N c ) pair and statistic choice, allowing fb to be 
computed. In Fig [l], we show how fb varies as a func- 
tion of N c , the total counts in the lightcurve, and the 
count-rate of the source. 

As is clear from Fig at fixed count-rate and bin- 
size, the statistical importance of the Xd an< ^ Xm raas 
is an increasing function of the number of counts in 
the lightcurve; indeed it rapidly becomes very large as 
the number of data-bins gets large. This is simply be- 
cause the absolute value of the bias is a pproximately con- 
stant as the count-rate becomes large flJading fc Riisager 
1996), whereas the statistical error is a d ecreasing func- 
tion of N c . In stark contrast, for the Cash C-statistic fits, 
we find \fb\ -C 1; in fact the bias using the C-statistic 
is exactly zero here. This can be seen by substituting 
ALi = PT into Eqn |l] and analytically minimizing C, which 
leads to p — Di/ t, the expectation of which is po- 

2.2. Thermal plasma 

We next consider the case of an X-ray emitting, op- 
tically thin, collisionally ionized thermal astrophysical 

2 http:/ /lcgapp.cern.ch/project/cls/work- 

packages / mathlibs / minuit / index, html 
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Fig. 2. — /{, for the recovered temperature of a thermal plasma from its Chandra X-ray spectrum, as a function of total counts N c in 
the spectrum, for a thermal plasma with zero metal abundance (Zp e = 0) and for a Solar abundance plasma (Zp e = 1). The data-points 
represent the results of the Monte Carlo simulations (see text), and the error-bars on are all ~0.03. Circles denote /& for the x\ fits, 
triangles indicate Xm fits, and stars indicate C-statistic minimization. The temperature of the thermal plasma is indicated by the style of 
the line joining the data-points, with solid, dashed, dot-dash and dotted lines indicating a plasma with kT=l, 3, 5 and 7 keV, respectively. 



plasma, the X-ray spectrum of which is dominated by 
thermal bremsstrahlung plus line emission. Using the 
Xspec spectral-fitting package we simulated and fitted 
spectra which might be observed with the ACIS-I instru- 
ment aboard the Chandra X-ray observatory. Since we 
considered the high count limit, we did not include any 
background in the simulation s. For the source m odel we 
used a zero redshift APEC ( |Smith et af] |200lD plasma 
model modified by linc-of-sight absorption due to the 
cold Galactic ISM ( [Balucinska-Church fc McCammon 
1992). We assumed an absorption hydrogen column- 



density of 10 cm , consistent with a high Galactic 
latitude pointing. The redistribution matrix (RMF) and 
effective area (ARF) files (which map the physical source 
model onto the binned data taken by the detector) were 
created for a near-aimpoint position in a representative 
ACIS-I observation. 

Using the "fakeit" command in Xspec we simulated 
sets of 1000 spectra for different combinations of temper- 
ature, metal abundance and total counts per spectrum. 
This procedure creates data in a set of pre-defined bins 
by drawing a random number from a Poisson distribu- 
tion with intrinsic mean Mio, i.e. the expected counts 
predicted by the model. We have verified that we ob- 
tain consistent results with our own software. We chose 
input temperatures of 1, 3, 5 and 7 keV/k respectively 
and each model has heavy element abundances relative 
to h ydrogen set either to zero , or to match the Solar val- 
ues ( |Grevesse fe SauvaJ 199£ ) . We considered data only 
in the 0.5-7.0 keV range, and simulated spectra with a 
range of N c spanning 10 3 to 10 6 , In each spectrum, the 
simulated data points were regrouped to ensure at least 
20 photons per bin. We fitted each simulated spectrum 
while allowing only the temperature and normalization 
to vary, separately using x% Xm an d the C-statistic, all 
of which are implemented as standard in Xspec. We show 
fb obtained with each statistic as a function of temper- 
ature, heavy element abundance and N c in Fig ^|. In 
contrast to the x\ an d Xm ^s, for which \fb\ ~ 0.5-1 it 
is immediately apparent that the C-statistic results are 
practially unbiased. 



3. DISCUSSION 

For the two very different problems discussed in § |^, 
we find that, using realistic data, \fb\ <C 1 only for the 
fits using the C-statistic while the best-fitting parameters 
obtained by minimizing \ 2 d and x m were significantly bi- 
ased (fb of order unity). In order to explain these results, 
in the Appendices we derive an approximate analytical 
expression for the order of magnitude of fb given an ar- 
bitrarily parameterized model. For fits using Xd.i wc nn( i 
/b~ TN/ \/Nq, where N is the number of data-bins and 
N c the number of counts in the data-set. Alternatively, 
fits using Xm were biased in the opposite sense, yield- 
ing /t~ ±0.5iV/ \fW c . This is true even in cases where 
the number of counts far exceeds the canonical 20 per 
bin required for deviations fr om Gaussianitv t o be u nim- 
portant. As pointed out by Wheaton et al. ( 1995 ), the 
bias arises not from deviations from Gaussianity but be- 
cause of the misparameterization of the problem when 
these approximations are used with an arbitrary model. 
In contrast, those fits employing the C-statistic typically 
should have | fb | <C 1 . We show these order of magnitude 
estimates for the lightcurve problem as the various lines 
in Fig |J, revealing excellent agreement with the results 
of our simulations 3 . 

The values of fb obtained for the spectral-fitting prob- 
lem (Fig ||) are also easily understood in terms of these 
order of magnitude estimates. In the regime of relatively 
few counts (~ 1000 per spectrum), the statistical errors 
can be quite large (e.g. ±3 keV for the 7 keV plasma) 
and hence fb was small for all the statistics. For the cases 
with more counts the error-bars were small enough that 
the truncated Taylor expansion used in the Appendices is 
approximately valid. Considering a typical 7 keV plasma 

with N c = 10 5 , N is -400, implying f b 1.3 for X \ fits > 

which is close to the observed value. As N c falls, so too 
does N since more data-bins need to be grouped together 
to ensure at least 20 counts in each. This can more than 
offset the fall in N c and prevents fb from growing much 

3 In fact, for this problem, these estimates are almost exact, as 
can be seen by substituing the Mi = pr into the derivations in the 
appendices. 
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larger. In contrast, as N c gets even larger, there are few 
bins at the original instrument resolution which contain 
fewer than 20 counts (i.e. that need to be regrouped) and 
so N grows only slightly from 10 5 to 10 6 counts. Thus 
fb starts to fall as N c gets very large, as seen in Fig ||. 
A similar argument explains the trend of fb with N c for 
the xL nts - 

3.1. Removing the bias 

We have shown that, for fitting Poisson-distributed 
data with an arbitrary, parameterized model even in a 
fairly high-counts regime, the routine use of the Xd an d 
Xm approximations to the true x 2 is likely to give rise to 
biases in the best-fitting parameters which can be of or- 
der the statistical error, or even larger. We argue, there- 
fore, that the x\ ano - Xm approximations should generally 
be avoided for fitting Poisson-distributed data, unless the 
square root of the number of counts in the dataset far ex- 
ceeds the number of bins being fitted, or the model pa- 
rameterization is chose n with care. In contrast, fits per- 
formed using the Cash C-statistic yield estimates which 
are, to all practical purposes, unbiased in the regimes we 
have discussed in this paper and we, therefore, strongly 
recommend its use instead. 

The major objection to the widespread uptake of the 
C-statistic for model-fitting is that the statistic itself can- 
not be directly interpreted as a goodness-of-fit indicator 
in a similar fashion to the (true) x 2 statistic. In or- 
der to test the hypothesis that the data are consistent 
with the (best-fitting) model, therefore one must adopt 
an alternative strategy. Arguably the most robust tech- 
nique 4 is a fairly costly Monte Carlo approach, for ex- 
ample that implemented as the "goodness" command in 
Xspec. On each simulation, an artificial dataset is gen- 
erated by adding Poisson-noise to the best-fitting model, 
and the artificial data are fitted. The fraction of simu- 
lations which yield a best-fitting statistic value which is 
more negative (i.e. a better fit) than the best-fit statis- 
tic for the real data is an estimate of the significance at 
which the null hypothesis can be rejected. We note that 
the distribution of the best-fitting parameter values from 
these simulations can be used at minimal extra compu- 
tational cost to derive a confidence interval for each pa - 
rameter (e.g. Humphrey et al. 2006; Buote et al. 2003), 
as well as providing a direct assessment of the magni- 
tude of any residual bias. In the case where the num- 
ber of fitted parameters becomes large, this Monte Carlo 
method of error-bar estimation is far more efficient than 
the more u sual proced ure of stepping through parameter 
space (e.g. Cash 1979 ). 

While it is not strictly necessary to bin the data in or- 
der to fit a model with the C-statistic, the choice of bin- 
ning is critical for i nterpreting the goodness-of-fit (e.g 



Hclsdon et al 



2005| ). The reason is that the statistic 



is defined only locally, in the sense that it contains no 
information about the relative ordering of the residuals 
between data and model. To illustrate this point, con- 
sider testing a lightcurve with the model Mi = pr. Let 
the data be sufficiently sparsely binned that the number 
of counts in bin i, Di can only equal or 1, and further 



4 For example, the method outlined by Raker He Cousins -( 1 98.1 1 

hn arrnratn in all fnnnt vncrimpa ( H angrhi IH Xr Tnnlgrhnl 



2001) 



let all of the nonzero data-points be in the second half of 
the lightcurve (which clearly has only a ~ 2~ Nc chance 
of occurring randomly, if the model is correct). Substi- 
tuting the best-fitting value (p = N c /Nt) into Eqn |l|, it 
is clear that 



C = 2'Y J D l logD i 



2N c log 



Nc 
N 



= -2NJog 



N 



i.e. C depends only on the number of counts in the 
lightcurve, and not their relative order. On each Monte 
Carlo simulation we generate an artificial lightcurve from 
the best-fitting model, so clearly approximately half will 
have more than N c counts in total, and half will have 
fewer. Provided N c /N <C exp(— 1), which must be true 
in this case, C varies monotonically with N c and so the 
estimated null hypothesis probability will be 0.5 (i.e. a 
"good fit"). Alternatively, one can rebin the data into 
two equally-sized bins (one containing counts and one 
N c ), in which case the ^ DJogDi term is no longer 
and the test has greater power to distinguish between the 
model and the data. Based on Monte Carlo simulations, 
the model will be rejected at better than 99.9% signifi- 
cance provided iV c >8. It is worth noting, however, that 
increasing the binning is not always helpful; if we were 
to bin the data even more heavily (into a single bin), we 
would wash out the information which allows us to dis- 
tinguish between the model and the data. In the case 
that the data are inconsistent with the model, the null 
hypothesis probability is almost always a strong function 
of the adopted binning. 

It is important to appreciate that the dependence of 
the null hypothesis probability on the binning of the data 
is by no means limited to uses of the C-statistic, since x 2 
(which also contains no information about the grouping 
qf the residual s) suffers from exactly the same problem 
jGumbei 1943 ). In practice, the appropriate binning to 
use is that which maximizes the difference between the 
data and the model, which likely depends on the pre- 
cise model being fitted and may involve some experi- 
mentation. Choosing to adopt the Xm approximation on 
the grounds that it is "easily interpretable" for an ad 
hoc binning scheme is clearly something of a false econ- 
omy, especially coupled with the intrinsic bias which can 
arise when it is used. The problem is exacerbated for the 
y 2 statistic, which is only ap proximately x 2 distributed 
Pauschild fc Jcntschcl|[200l| ). 

Our present discussion does not consider the potential 
impact of background uncertainti es (which can i ntroduce 
additional systematic errors; e.g. Liu ct al. 2008), nor the 
case of very few counts per bin. In these circumstances it 
is possible that bias may remain on best-fitting param- 
eters recovered from C-statistic fitting, or its variant in 
the Xspec package which takes account of dire ct back- 
ground subtraction (Leccardi & Molcndi 2007). A full 
assessment of such putative effects needs to be carried 
out on a case-by-case basis, but is relatively straightfor- 
ward with the Monte Carlo method outlined above, and 
we will address some of these issues in a future paper 
jLiu ct al] fc008| ). 

Alternative approximations to x 2 have been proposed 
which are less biased in the case of very few counts per 
bin (where the bias is partially due to deviat ions from 
Gaussianity ) . In general t hese s chemes (e.g. Whcaton 
et al.| |1995| ; [Kearns et al| |1995| ; |Churazov et al.| |.H'H| 
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are not rigorously motivated and there is no good the- 
oretical reason to expect them to yield genuinely unbi- 
ased estimates for any given problem in the high counts 
case. Coupled with their lack of widespread use and 
the difficulty of assessing their performance analytically, 
we do not address them here other than to state that, 
aside from the ostensible transparency of the % 2 value 
(which, as stated above, can be deceptive), we see lit- 
tle compelling reason to use them in preference to the 
C-statistic. 

3.2. Scientific impact of the \ 2 bias 

The existence of the bias will undoubtedly have im- 
plications for the scientific conclusions of various studies 
which have adopted x\ or Xm approximations for fitting 
Poisson distributed data without assessing the limita- 
tions of these approximations in that context. In this 
section, we highlight a few cases of particular interest 
from the field of X-ray astronomy, in which x\ ^ s typi- 
cally adopted as a de facto standard (e.g. in Xspec). 

The in-flight inter-calibration of X-ray satellites can be 
assessed by comparing spectra l-fits of very bright , canon 



ical "calibration sources" (e.g. [Kirsch et aj_J|2005| [Plucin 



ky et al. 2008). Since different X-ray instruments have 



different numbers of spectral bins (e.g. typically < 500 
for the XMM PN and typically < 50 for the RossiXTE 
PCA) and since differences in exposure time and collect- 
ing area mean that there are widely varying numbers of 
photons in the calibration datasets, the absolute magni- 
tude of the bias is expected to vary from instrument to 
instrument. For realistic sources it can be of order a few 
percent or higher, which is competetive with the absolute 
target calibration of most instruments. Since calibration 
sources are generally very bright, the statistical errors on 
recovered parameters are typically very small, and hence 
we may see parameter spaces which do not overlap even 
if the satellites are perfectly inter-calibrated. 

X-ray studies of galaxy clusters and groups routinely 
involve the computation of gravitating mass profiles 
from the measured gas temperature and density pro- 
files (obtained from spatially-resolved spectr oscopy) and 
the e quatio n of hydrostatic equilibrium (e.g. Gastaldcllo 
|st al.| |2007D . Based on our simulations, and the argu- 
ments m — Appendix [A|, we expect roughly a 5-10% frac- 
tional bias on the temperature, which would translate 
into a similar bias on the mass, especially in the clus- 
ter regime. Errors of this magnitude are significant if 
clusters are to be used for precision cosmology measure- 
ments. As an example, the relation between a cluster's 
virial mass (M v i r ) and dark matter halo concentration 
(c), both of which are derived by fitting a canonical dark 
matter halo model (the NFW profile) to the measured 
mass profile, can be used to distinguish between cosmo- 
logical models. Clearly M V j r is likely to be underesti- 
mated due to the bias but the effect on c is harder to 
predict since it depends sensitively on the exact slope 
of the mass profile, which in turn depends on how the 
bias varies with radius. Still, if c is systematically bi- 
ased by as much as ~5%, as in our example below, that 



would be comparable to the current best statistical er- 
ror on the normalization of the c-M v ; r relation, which is 
the pri me discriminator b etween different cosmological 
models ( Buotc et al. 2007). 

To illustrate the bias on M v j r and c with real data, we 
have reduced and analysed high-quality Chandra data of 
a nearby, X-ray bright cluster, A 1991. We obtained 39 ks 
of data from the Chandra archive, which we processed 
to obtain the temperature , gas density and gravit ating 
mass profiles as outlined in Gastaldello et al. (2007). Us- 
ing the C-statistic we fitted the data in 9 radial bins 
with parameterized models for the gas temperature and 
density which, inserted into the equation of hydrostatic 
equilibrium, enabled us to obtain the mass profile and 
hence M vir and c. We found M vir = 2.60 ±0.19 x 10 13 M Q 
and c= 7.94 ± 0.47, which are broadly co nsistent with 
the measurements of Vikhlinin et al. ( 2006| ), who appar- 
ently used Xd- Refitting the data, this time using x% we 
found that the temperature was reduced by ~2% on av- 
erage and in individual bins it could change by as much 
as ^1-cr. This bias translated into a ^4% reduction in 
the resulting M v ; r and c, or a ~0.5-tr effect. The full 



details of this analysis will be given in Liu et al. ( 2008 ) . 

Another scaling relation which is key for understand- 
ing cluster physics is the relation between M V i r and the 
emission-weighted X-ray temperature of the gas, Tx- 
Both Tx and M v ; r are likely underestimated in most 
published studies (which generally use Xd)- Since the 
spectrum used to measure the temperature usually con- 
tains far more counts than any of the individual spectra 
used to determine the mass profile, the effect on M V j r is 
likely to be much larger. If this effect is as large as our 
estimated ~5-10%, it will not only exceed the current 
best statistical error on the normalization of the mea- 
sured relation, but it will also partially reduce the ~30% 
discrepancy in the normalization between the measured 
relation and the pr edictions of self-simi lar models of clus- 
ter formation (e.g. Arnaud et al. [2005 ). 

As a final illustration of the effects of the bias in real 
data an alysis, i n his X -ray study of the hot gas in galaxy 
groups, Buote ( 2000 ) estimated error-bars on the tem- 
perature and Fe abundance by a Monte Carlo procedure 
similar to that discussed in § 3.1. In a significant number 
of cases, the 1-tr error range inferred from the simulations 
did not actually contain the best-fitting parameter (i.e. 
the bias was more than 1-cr), giving rise to error-bars 
which appeared distorted when plotted. 
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APPENDIX 
A. BIAS IN x 2 -FITTING 
Al. Data weighting 

We here derive an expression for the magnitude of the bias when fitting a parameterized model using the Xd 
approximation. We start by setting Mj = Mi(p), with the parameter p having a true value po and defining M^o = 
Mi(po). Starting with Eqn 0, differentiating with respect to p and setting the derivative equal to zero, we obtain: 



= 



dxl _ 2 dMj 
dp t-r 1 dp 



Mi -Pi 

Da 



Now, we write D t = M t0 + SD t and M l ~ M i0 + 5pM' iQ + 5p 2 M" /2 
and so on. Substituting these in and rearranging we obtain: 



where M' i0 



iQ 



E 



Sp 



12 

4" M l0 



Mm ^ \ M?_ M M 



X) 



SDi 



2M 



E 



2AL, 



5D t +J2 



If higher order terms can be ignored, this is just a quadratic equation of the form: 



(Al) 

dMi I dp evaluated at p = po , 



2^ \ Mf Q Ml 



Ml 



iQ 



SD 



(A2) 



o = + a * SD i + & p ( B + E b * 6D * 



b'SD 2 



°i 5D i 



E c ^i 



5p = 



Sp 2 {C _ _ 

\ i i 

2(C + £,c i( $A + £^Df) 



(B + Ei + b'JD 2 ) 2 - 4(E, aiSDi + E, a{5I>?)(C + E, ^ + E j c£MJ?) 



(A3) 



2(C + E i c l <5A + E,^ 2 ) 

(A4) 

where we only keep the solution consistent with Sp being small. Assuming \6Di\ <C M^, both the square root and the 
recipricol terms can be expanded as a power series in SDi. Writing only terms up to second order, we obtain: 



Spc 



1 



\ " 



B 4^ 



MA-^$>^£>?+E 



SDjSDj 
B 2 



B 



. M i0 ( Caf , 

^ < 5p>^^ T lb i a i -^-Ba' i 



(A5) 



where <. . . > denotes the expectation operator. We have used the distributive nature of the expectation operator and 
we have used the results < SDi >= and < SDiSDj >= 0, if i ^ j or = M M Hi — j, which are true for both Poisson 
and Gaussian distributions (provided the latter has a statistical error in bin i, <Ji = y/< Di >). 

In general, < Sp > will be nonzero. To estimate its magnitude it is helpful to define M' iQ = Miofl(po)/poi M" Q = 
Miofi(po)/Po and Mjo = N c rriio, where N c is the total number of counts in the dataset. Making these substitutions 
and rearranging we find that 



to* = §(/? + /;'), 
Po 



C 2 



and-^ = -^' 2 

2p 3 of' 2 



Pimm 



where j' 2 = Ei fl 2m io, i-e. the model- weighted average of f' 2 , and so on. We note that 



Pi 



dp 



dp 



dp 



(A6) 



(A7) 



and so, on average, ////' ~ // 3 for a broad class of problem, where the ~ symbol indicates similar orders of magnitude. 
Thus, on average &,aj ~ f^/po-, ~Ca 2 /B ~ ~fi 3 /Po and —Ba\ ~ —Nf^/pQ, where we have used l/m^o ~ N, the 
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number of data bins. Since N ^> 1 in general, it follows tha t the third of the parenthetical terms in Eqn 
larger than the other two. Keeping only that term, Eqn A5 becomes 



< Sp >: 



PqN 



Ei N$i 



f 



12 



is much 



(A8) 



To estimate fb, we adopt the s tatistical e rror obtained from fitting the C-statistic, which is expected to be close to 
that obtained with \ 2 methods (Cash 1979| ). As we show in Appendix^, to second order this is given by: 



< Sp 2 >: 



vl E* m !ofl 2 



pI 



(Ei^ 2 ) 2 ^(E^o// 2 ) 2 N c r 



We have assumed Ei m io/i 2 ~ f' 2 /N? , which is justified since to^q ~ 1/N. Thus we obtain: 



fb 



N 

Wc 



Ei N fi 



T- 



N 

W 



(A9) 



(A10) 



where we have assumed the term in square brackets is ~ ±1, that is the absolute value of the mean of f[ (averaged 
over the data set) is of the same order of magnitude as its (model-weighted) root mean square. This will likely be 
approximately true for an arbitrary model (although it should be verified in any particular case) unless one takes 
considerable care over choosing the particular parameterization of the model, in which case it may be possible to 
obtain fa close to zero. 

Strictly speaking, this derivation is only valid for single-parameter models. However, it is relatively straightforward 
to generalize it to the mul ti-p arameter case, which leads to a set of coupled quadratic equations (one per parameter) 
of a form similar to Eqn A3. This implies that the bias on the parameters, or at least some combination of the 
parameters, should be of a similar order to that derived above. 

A2. Model weighting 

For the case of model weighting, the problem is remarkably similar. Starting with Eqn ^ differentiating and rear- 
ranging, we obtain 



= 



d X 2 



dp 



^ dMi 
dp 



M 2 -m 

M? 



(AH) 

Using the same expansion methods we adopted for the data-weighting case, we obtain (ignoring all terms higher than 
second order): 



0: 



; E 



E 



2^ M2 0U i 



X) 



Ml 



SIX 



Sp 2 



Sp 



E 



E 



/3 



6MI 
Mf 



Mf 



M 



id 



V 2M% 

_ I 



E 



M, 



(0 



M, 



m 



SDi 



mi 

M 2 



3M, 



M 4 
IVI i0 



M? 



M: 



2M 2 



SDi 



(A12) 

which is a qua dratic in Sp, of the form discussed in the previous section. Therefore, the bias can be trivially computed 
from Eqn A5. Substituting for M i0 , M' iQ and M" , exactly as before, we obtain 



.Miff 

Po 



2/f), 



Co? 
B 



6/f / f' 3 - f'f" 



Po 



and — Ba[ 



I fl2 



mioPo 



(A13) 

Following the arguments used for the data-weighting case, it is clear that \Ba'j\ is much larger than the other terms, 
so 



r ^ IPON 



Ei jy_/j 

F 



1 N 
fb 2 



N r 



Ei Af/i 



4 



N 

W 



(A14) 



Note that the bias due on parameters recovered under the x\ approximation is —2 times the bias with xi, 



B. CASH C-STATISTIC BIAS AND ERROR 



We here esti mate t he magnitude of the bias and the statistical error we expect on the recovered parameter for the 



case where the [Cash C-statistic is used to fit the data. I n general, it is expected that parameters obtained from a 
maximum likelihood method have some level of bias (e.g. Ferguson 1982j ) but we here show that, for the C-statistic 
in the high counts regime, this bias is likely far smaller than the statistical error. We can approach this problem by 
essentially the same technique used in Appendix |A|. Differentiating Eq [I], setting it equal to and rearranging, we 
obtain: 



2— 1 dp 



Di 



(Bl) 



Using the expansion methods we adopted in Appendix |a|, we obtain the approximate expression: 



= -J2 M MSD t + Sp 



M' 2 



M, 



-Sp 2 



Ml 



(B2) 



If higher order terms can be ign ored, this is just a quadratic equation similar to that solved in Appendix |A|, but with 



b' i = c[ = 0. From Eqn A5 it is easy to show that only keeping terms up to second order 



< 5p 2 > ~ £ aiO-j < SDiSDj >= £ a 2 M l0 = 



(DO 2 



(B3) 



Now, in general the C-statistic fits are found to be far less biased tha n th ose using \\ or Xm- This can be shown 
by substituting the appropriate expressions for each of the terms in Eqn A5 and making the various substitutions for 
M M , M- and M" a outlined in Appendix [A|. We obtain: 



<S P >^- 



(E, m%f. 
Now, assuming 

Ei - f' 2 /^- 1 (see Appendix g), we obtain: 



(B4) 



< Sp >' 



Po_ 



f'f" 



(/«)' 



(B5) 



where we have allowed two terms of order /' 3 in the numerator of the bracketed expression to cancel; although they 
are unlikely to cancel completely we assume that they largely do so, making the /'/" term more important. Relaxing 
this assumption does not affect our conclusions. Adopting the order of magnitude estimate for the statistical error 
derived in Appendix we obtain 



'Nr. 



(B6) 



which is vanishingly small as iV c becomes large. We have assumed that the term in square brackets is of order unity. 
This can be justified because, as shown in Appendix [A|, /'/" ~ /' 3 which ~ (/ ,2 ) 3 / 2 for a broad range of problem. 
Although the accuracy of this assumption should be tested for any given problem, provided /'/" is not larger than 
(/ /2 ) 3 / 2 by a factor ~ N{^> 1), the parameters recovered from the C-statistic fit will be less biased than those using 
Xd or Xm- Finally, since typically /j<1 we are justified in assuming \J < dp 2 > is the I-cr statistical error on p. 
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